
Ethan Collins
Pattern Recognition Specialist

Amazon Web Services (AWS) Web Application Firewall (WAF) is a powerful security service that helps protect web applications from common web exploits that could affect availability, compromise security, or consume excessive resources. While crucial for safeguarding web assets, AWS WAF can present a significant challenge for automated web scraping and data extraction processes, often blocking legitimate crawlers.
This article provides a comprehensive guide on how to integrate Crawl4AI, an advanced web crawler, with CapSolver, a leading CAPTCHA and anti-bot solution service, to effectively solve AWS WAF protections. We will detail both the API-based and the browser-extension integration methods, including code examples and explanations, to ensure your web automation tasks can proceed without interruption.
AWS WAF operates by monitoring HTTP(S) requests that are forwarded to an Amazon CloudFront distribution, an Application Load Balancer, an Amazon API Gateway, or an AWS AppSync GraphQL API. It allows you to configure rules that block common attack patterns, such as SQL injection or cross-site scripting, and can also filter traffic based on IP addresses, HTTP headers, HTTP body, or URI strings. For web crawlers, this often means that automated requests are flagged, challenged, or blocked outright before the target content ever loads.
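To make the rule model concrete, here is a simplified sketch of a rate-based rule in the shape used by the AWS WAFv2 API. The field names follow the WAFv2 rule schema; the rule name, limit, and metric name are illustrative values, not taken from any real deployment:

```python
# A simplified sketch of a WAFv2 rate-based rule, illustrating the kind
# of filtering a crawler runs into. Field names follow the shape of the
# AWS WAFv2 API; the concrete values here are illustrative only.
rate_limit_rule = {
    "Name": "throttle-heavy-clients",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            # Block a client IP once it exceeds this many requests
            # within the evaluation window
            "Limit": 2000,
            "AggregateKeyType": "IP",
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "throttle-heavy-clients",
    },
}

print(rate_limit_rule["Statement"]["RateBasedStatement"]["Limit"])
```

A crawler that exceeds such a limit, or matches a bot-control rule, receives the WAF challenge instead of the page content.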
CapSolver provides a robust solution for obtaining the necessary aws-waf-token cookie, which is key to bypassing AWS WAF. When integrated with Crawl4AI, this allows your crawlers to mimic legitimate user behavior and navigate protected sites successfully.
💡 Exclusive Bonus for Crawl4AI Integration Users:
To celebrate this integration, we’re offering an exclusive 6% bonus code — CRAWL4 — for all CapSolver users who register through this tutorial.
Simply enter the code during a recharge in the Dashboard to receive an extra 6% credit instantly.
The most effective way to handle AWS WAF challenges with Crawl4AI and CapSolver is through API integration. This method involves using CapSolver to obtain the required aws-waf-token and then injecting this token as a cookie into Crawl4AI's browser context before reloading the target page.
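Under the hood, the SDK call used in the example below wraps a createTask request to the CapSolver API. The payload can be sketched as follows; the "type" and "websiteURL" fields match what the SDK example passes, while the "clientKey" field name is taken from CapSolver's documented request format and should be verified against the docs linked in the code:

```python
import json

def build_waf_task(api_key: str, website_url: str) -> str:
    # Payload for CapSolver's createTask endpoint. "type" and "websiteURL"
    # are the same fields the SDK example below passes to capsolver.solve;
    # "clientKey" is the API-key field per CapSolver's docs (verify there).
    payload = {
        "clientKey": api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
        },
    }
    return json.dumps(payload)

body = build_waf_task("CAP-xxxx", "https://nft.porsche.com/onboarding@6")
print(body)
```

The SDK hides this plumbing, but knowing the payload shape helps when debugging failed tasks or switching to raw HTTP calls.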
The workflow has three steps:

1. Request the token: Submit a task of the AntiAwsWafTaskProxyLess type along with the websiteURL. CapSolver will return the necessary aws-waf-token cookie.
2. Inject the cookie: Use Crawl4AI's js_code functionality to set the obtained aws-waf-token as a cookie in the browser's context. After setting the cookie, the page is reloaded.
3. Continue crawling: With the valid aws-waf-token cookie in place, Crawl4AI can now successfully access the protected page and continue with its data extraction tasks.

The following Python code demonstrates how to integrate CapSolver's API with Crawl4AI to solve AWS WAF challenges. This example targets a Porsche NFT onboarding page protected by AWS WAF.
import asyncio

import capsolver
from crawl4ai import *

# TODO: set your config
# Docs: https://docs.capsolver.com/guide/captcha/awsWaf/
api_key = "CAP-xxxxxxxxxxxxxxxxxxxxx"  # your CapSolver API key
site_url = "https://nft.porsche.com/onboarding@6"  # page URL of your target site
cookie_domain = ".nft.porsche.com"  # the domain to which the cookie applies
captcha_type = "AntiAwsWafTaskProxyLess"  # type of your target captcha

capsolver.api_key = api_key


async def main():
    browser_config = BrowserConfig(
        verbose=True,
        headless=False,
        use_persistent_context=True,
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # First visit triggers the AWS WAF challenge
        await crawler.arun(
            url=site_url,
            cache_mode=CacheMode.BYPASS,
            session_id="session_captcha_test",
        )

        # Get the aws-waf-token cookie using the CapSolver SDK
        solution = capsolver.solve({
            "type": captcha_type,
            "websiteURL": site_url,
        })
        cookie = solution["cookie"]
        print("aws waf cookie:", cookie)

        # Inject the cookie, then reload so the next request carries it
        js_code = f"""
        document.cookie = 'aws-waf-token={cookie};domain={cookie_domain};path=/';
        location.reload();
        """

        # Wait until the real page title appears, i.e. the WAF is passed
        wait_condition = """() => {
            return document.title === 'Join Porsche’s journey into Web3';
        }"""

        run_config = CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS,
            session_id="session_captcha_test",
            js_code=js_code,
            js_only=True,
            wait_for=f"js:{wait_condition}",
        )
        result_next = await crawler.arun(
            url=site_url,
            config=run_config,
        )
        print(result_next.markdown)


if __name__ == "__main__":
    asyncio.run(main())
Code Analysis:
- CapSolver call: The capsolver.solve method is invoked with the AntiAwsWafTaskProxyLess type and websiteURL to retrieve the aws-waf-token cookie. This is the crucial step where CapSolver solves the WAF challenge and provides the necessary cookie.
- Cookie injection (js_code): The js_code string contains JavaScript that sets the aws-waf-token cookie in the browser's context using document.cookie, then triggers location.reload() to refresh the page so that the subsequent request includes the newly set valid cookie.
- wait_for condition: A wait_condition is defined so that Crawl4AI waits for the page title to match 'Join Porsche’s journey into Web3', indicating that the WAF has been passed and the intended content is loaded.

CapSolver's browser extension provides a simplified approach for handling AWS WAF challenges, especially when leveraging its automatic solving capabilities within a persistent browser context managed by Crawl4AI.
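One fragility in the API example above: js_code is assembled by interpolating the raw token into a JavaScript string, so a token containing a quote character would break the script. A small helper (hypothetical, not part of either library) can escape the values with json.dumps:

```python
import json

def build_cookie_js(token: str, domain: str, name: str = "aws-waf-token") -> str:
    # json.dumps produces a valid JS string literal, so quotes or
    # backslashes inside the token cannot break out of the assignment.
    cookie_value = f"{name}={token};domain={domain};path=/"
    return f"document.cookie = {json.dumps(cookie_value)};\nlocation.reload();"

js = build_cookie_js('abc"def', ".nft.porsche.com")
print(js)
```

The returned string can be passed as js_code in CrawlerRunConfig exactly like the inline version.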
The extension-based workflow relies on a persistent browser profile:

- Persistent profile: Crawl4AI uses user_data_dir to launch a browser instance that retains the installed CapSolver extension and its configurations.
- Automatic solving: When the protected page loads, the extension detects the AWS WAF challenge, solves it, and obtains the aws-waf-token cookie. This cookie is then automatically applied to subsequent requests.

This example demonstrates how Crawl4AI can be configured to use a browser profile with the CapSolver extension for automatic AWS WAF solving.
import asyncio

from crawl4ai import *

# TODO: set your config
user_data_dir = "/browser-profile/Default1"  # ensure this path points to a profile with the configured extension

browser_config = BrowserConfig(
    verbose=True,
    headless=False,
    user_data_dir=user_data_dir,
    use_persistent_context=True,
    proxy="http://127.0.0.1:13120",  # optional: configure a proxy if needed
)


async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result_initial = await crawler.arun(
            url="https://nft.porsche.com/onboarding@6",
            cache_mode=CacheMode.BYPASS,
            session_id="session_captcha_test",
        )
        # The extension will automatically solve the AWS WAF upon page load.
        # Use asyncio.sleep (time.sleep would block the event loop); adjust
        # the delay, or add a wait condition, as necessary for the extension
        # to finish before proceeding with further actions.
        await asyncio.sleep(30)
        # Continue with other Crawl4AI operations after AWS WAF is solved,
        # e.g. check for elements or content that appear after verification.
        # print(result_initial.markdown)  # inspect the page content after the wait


if __name__ == "__main__":
    asyncio.run(main())
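The fixed sleep in the example above is fragile: it wastes time when the extension finishes early and fails when it finishes late. A sketch of a cookie-based alternative, assuming the extension ultimately sets an aws-waf-token cookie readable from document.cookie (i.e. not HttpOnly), uses Crawl4AI's "js:" wait_for prefix to poll a predicate instead:

```python
def waf_wait_condition(cookie_name: str = "aws-waf-token") -> str:
    # Builds a JS predicate for Crawl4AI's wait_for. The "js:" prefix
    # (used in the API example earlier) makes Crawl4AI poll the function
    # until it returns true, i.e. until the cookie appears.
    predicate = (
        "() => document.cookie.split('; ')"
        f".some(c => c.startsWith('{cookie_name}='))"
    )
    return f"js:{predicate}"

print(waf_wait_condition())
```

Pass the returned string as wait_for in a CrawlerRunConfig in place of the sleep.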
Code Analysis:
- user_data_dir: This parameter is essential for Crawl4AI to launch a browser instance that retains the installed CapSolver extension and its configurations. Ensure the path points to a valid browser profile directory where the extension is installed.
- Waiting for the extension: A fixed sleep is included as a general placeholder to allow the extension to complete its background operations. For more robust solutions, consider using Crawl4AI's wait_for functionality to check for specific page changes that indicate successful AWS WAF resolution.

If you prefer to trigger the AWS WAF solving manually at a specific point in your scraping logic, you can set the extension's manualSolving parameter to true and then use js_code to click the solver button provided by the extension.
import asyncio

from crawl4ai import *

# TODO: set your config
user_data_dir = "/browser-profile/Default1"  # ensure this path points to a profile with the configured extension

browser_config = BrowserConfig(
    verbose=True,
    headless=False,
    user_data_dir=user_data_dir,
    use_persistent_context=True,
    proxy="http://127.0.0.1:13120",  # optional: configure a proxy if needed
)


async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result_initial = await crawler.arun(
            url="https://nft.porsche.com/onboarding@6",
            cache_mode=CacheMode.BYPASS,
            session_id="session_captcha_test",
        )
        # Wait a moment for the page to load and the extension to be ready
        await asyncio.sleep(6)

        # Trigger the manual solve button provided by the CapSolver extension
        js_code = """
        let solverButton = document.querySelector('#capsolver-solver-tip-button');
        if (solverButton) {
            const clickEvent = new MouseEvent('click', {
                bubbles: true,
                cancelable: true,
                view: window
            });
            solverButton.dispatchEvent(clickEvent);
        }
        """
        run_config = CrawlerRunConfig(
            cache_mode=CacheMode.BYPASS,
            session_id="session_captcha_test",
            js_code=js_code,
            js_only=True,
        )
        result_next = await crawler.arun(
            url="https://nft.porsche.com/onboarding@6",
            config=run_config,
        )
        print("JS execution results:", result_next.js_execution_result)

        # Allow time for the AWS WAF to be solved after the manual trigger
        await asyncio.sleep(30)  # adjust as necessary
        # Continue with other Crawl4AI operations


if __name__ == "__main__":
    asyncio.run(main())
Code Analysis:
- manualSolving: Before running this code, ensure the CapSolver extension's config.js has manualSolving set to true.
- Simulated click: The js_code simulates a click event on #capsolver-solver-tip-button, the button the CapSolver extension provides for manual solving. This gives you precise control over when the AWS WAF resolution process is initiated.

Integrating Crawl4AI with CapSolver offers a robust and efficient solution for bypassing AWS WAF protections, enabling uninterrupted web scraping and data extraction. By leveraging CapSolver's ability to obtain the critical aws-waf-token and Crawl4AI's flexible js_code injection capabilities, developers can ensure their automated processes navigate WAF-protected websites seamlessly.
This integration not only enhances the stability and success rate of your crawlers but also significantly reduces the operational overhead associated with managing complex anti-bot mechanisms. With this powerful combination, you can confidently collect data from even the most securely protected web applications.
Q1: What is AWS WAF and why is it used in web scraping?
A1: AWS WAF (Web Application Firewall) is a cloud-based security service that protects web applications from common web exploits. In web scraping, it's encountered as an anti-bot mechanism that blocks requests deemed suspicious or automated, requiring bypass techniques to access target data.
Q2: How does CapSolver help bypass AWS WAF?
A2: CapSolver provides specialized services, such as AntiAwsWafTaskProxyLess, to solve AWS WAF challenges. It obtains the necessary aws-waf-token cookie, which is then used by Crawl4AI to mimic legitimate user behavior and gain access to the protected website.
Q3: What are the main integration methods for AWS WAF with Crawl4AI and CapSolver?
A3: There are two primary methods: API integration, where CapSolver's API is called to get the aws-waf-token which is then injected via Crawl4AI's js_code, and Browser Extension integration, where the CapSolver extension automatically handles the WAF challenge within a persistent browser context.
Q4: Is it necessary to use a proxy when solving AWS WAF with CapSolver?
A4: While not always strictly necessary, using a proxy can be beneficial, especially if your scraping operations require maintaining anonymity or simulating requests from specific geographic locations. CapSolver's tasks often support proxy configurations.
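For the proxied variant, CapSolver pairs each ProxyLess task type with a proxy-carrying counterpart. A sketch of such a payload follows; the AntiAwsWafTask type and proxy field follow CapSolver's usual naming convention, but verify the exact schema against the CapSolver documentation before use:

```python
# Sketch of a proxy-carrying CapSolver task. "AntiAwsWafTask" and the
# "proxy" field follow CapSolver's ProxyLess/proxied naming convention;
# confirm the exact schema in the CapSolver documentation. The proxy
# address below is a placeholder.
proxied_task = {
    "type": "AntiAwsWafTask",
    "websiteURL": "https://nft.porsche.com/onboarding@6",
    "proxy": "http://user:pass@1.2.3.4:8080",
}
print(proxied_task["type"])
```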
Q5: What are the benefits of integrating Crawl4AI and CapSolver for AWS WAF?
A5: The integration leads to automated WAF handling, improved crawling efficiency, enhanced crawler robustness against anti-bot mechanisms, and reduced operational costs by minimizing manual intervention.